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INTRODUCTION 


This  workshop  on  "Optical  System  Assessment  for  Design  and  Simulated 
Annealing"  represents  the  twenty-third  of  a  series  of  intensive  academic  / 
government  interactions  in  the  field  of  advanced  electro-optics,  as  part  of  the 
Army  sponsored  University  Research  Initiative.  By  documenting  the  associated 
technology  status  and  dialogue  it  is  hoped  that  this  baseline  will  serve  all 
interested  parties  towards  providing  a  solution  to  high  priority  Army 
requirements.  Responsible  for  program  and  program  execution  are  Dr. 
Nicholas  George,  University  of  Rochester  (ARO-URI),  and  Dr.  Rudolf  Buser, 
CCNVEO. 


2.  SUMMARY  AND  FOLLOW-UP 


The  meeting  began  with  a  brief  presentation  by  Bob  Spande  of  the  Night  Vision 
Laboratory  who  described  a  variety  of  opticai  systems  that  his  group  was 
involved  in  designing.  These  inciuded  a  number  of  many-element  systems  and 
systems  which  incorporate  diffractive  optics  and  gradient  index  elements. 
Since  the  design  of  such  systems  presents  entirely  new  challenges,  it  was  felt 
that  global  optimization  may  have  the  most  to  offer  in  such  contexts  on  account 
of  the  paucity  of  "ruies  of  thumb"  to  guide  the  designer  in  finding  appropriate 
stalling  points  for  the  regular,  iocal  optimization  algorithms. 

The  second  and  third  presentations  were  given  by  Greg  Forbes  and  Andrew 
Jones.  After  iliustrating  the  impracticality  of  an  algorithm  that  is  guaranteed  to 
find  global  the  minimum  for  problems  such  as  the  design  of  non-trivial  optical 
systems,  Forbes  described  the  ideas  behind  the  simulated  annealing  algorithm. 
He  also  presented  the  concepts  introduced  in  order  to  automate  the 
temperature  control  and  step  generation  components  of  such  an  algorithm  in  an 
adaptive  fashion.  Jones  described  some  of  the  principal  difficulties  that  must  be 
addressed  in  applying  this  algorithm  in  the  context  of  lens  design.  The  most 
important  of  these  being  the  problem  of  dealing  with  the  issues  of  constraint 
enforcement  and  non-linear  transformations  of  the  coordinates  and  merit 
function.  This  presentation  was  concluded  with  a  report  on  applications  of  this 
adaptive  simulated  annealing  algorithm  to  a  number  of  optical  design  problems. 

The  workshop  ended  following  a  discussion  of  potential  joint  projects  and 
collaboration.  It  was  decided  that  it  would  be  of  benefit  and  interest  to  all 
concerned  if  the  adaptive  simulated  algorithm  and  dedicated  hardware  at  The 
Institute  of  Optics  be  applied  to  a  variety  of  optical  design  problems  supplied  by 
NVL  for  comparison  with  their  proposed  systems. 


WORKSHOP  ON  GLOBAL  OPTIMIZATION 
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Towards  Global  Optimization 

with 

Adaptive  Simulated  Annealing 


Greg  Forbes  and  Andrew  Jones 
The  Institute  of  Optics 
University  of  Rochester 
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statement  of  Problem 

Configuration  space  : 

r  =  (xj,  X2, x„). 

Merit  function : 

m. 

Constraints  (  region  of  interest ) : 

Ci(r)  <  0  for  i  =  1, 2, p, 

Ej(r)  =  0  for  i  =  1, 2, q. 

Task : 

Find  the  point  [  inside  the  region  of 
interest  ]  where /(r)  takes  its  lowest  value. 
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Lesson  One 


There  is  no  algorithm  that  is  guaranteed 
to  find  the  global  minimum  in  a  finite  time. 


Simulated  Annealing  Aigorithm 


Repeat: 

•  Take  random  step  of  mean  distance  s 

•  if  downhill  accept  it  as  new  position 

•  if  uphill  "S’  accept  with  probability  e~^'^ 


4 


Concepts  for  Annealing 

Probability  of  accepting  a  step  : 

A  (r,  r')  =  probability  of  accepting  the 
step  from  r  to  r'  given  that  J  =  1  /p. 

_  _  ri,  /(!•')  </(r), 

Step  distribution  : 

S(r,  r')  dt’  =  probability  of  generating  a 
step  to  within  volume  element  dx'  at 
point  r'  given  that  the  start  point  is  r. 

Occupation  density  : 

p(r)dx  =  fraction  of  cases  where 
current  point  is  within  volume  element 
dx  at  position  r.  [  So,  J  p(r)dx  =  1.  ] 

{  Same  set-up  observed  many  times. } 
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Modelling  Annealing 

Effect  of  one  cycle  of  algorithm  : 

If  occupation  density  is  after  n 

cycles,  then 

p[w+l]^^^  _  +  arrivals  —  departures, 

where 

arrivals  =J  p*-"^(r')5(r’,r)A(r',r)  dx', 
departures  =J  p*^”^(r)5(r,r’)A(r,r')  dx'. 

Equilibrium  density  : 

As  n  — >  <50,  p  ”  (r)  tends  towards  a  fixed 
distribution: 

yeq  V' ’  P)  iV(p)  ^  > 

A^(p)  =  J  dx, 

provided  5(r,r’)  =  5(r',r). 


A  simple  merit  function 


Equilibrium  density  plots 


higher  temp 
moderate  temp 
lower  temp 


"Optimal"  temperature  and 

step  schedules 


Heuristic  principle : 

At  every  step,  keep  the  occupation  density 
as  close  to  equilibrium  as  possible. 


...need  to  have  measure  of  distance 
between  distributions,  so  define  one: 

=  y  \?a(r)-pbir)\dx. 
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Measures  of  equilibration 


•  Rate  of  change  of  equilibrium  distribution 
with  temperature 

^  p,^(r;|3  +  Ap)} 

A|i — ^0 1 


•  Reduction  per  step  in  distance  to 
equilibrium:  Suppose  =  pg^(r;P) 

and  temperature  is  changed  to  p+Ap  then 
define 

^{p[«+l](^)^  p^^(r;P  +  Ap)} 
AMO  2){pf"^(r),  pg^(r;P  +  Ap)} 
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Now  can  do  some  algebra  to  find: 

where 

</>:=/  9eqir\^)  fir)  dx. 

Also  have 

J  p^(r;P)^(r)-</>H-j[/(r’)-/(r)]S(r.r'M(r.r')dT-|dT 

1  P)  !/('■)-</ H 

Key  Observations : 

•  It  is  possible  to  evaluate  numerically  any 
such  entity  during  the  annealing  process. 

•  Can  use  S  and  i^to  control  algorithm. 


Restate  heuristic 

Keep  within  a  distance  of  e  of  equilibrium: 
If  is  temperature  at  cycle  n,  require 

Pe^(r;|3„)}  <  e,  for  all «. 


•  Can  now  deduce  that,  at  each  cycle,  P 
should  change  by  no  more  than: 

Ap  = 

NB.  Have  other  checks  on  whether 
cooling  too  quickly  that  can  be  used  as 
governors.  For  example,  it  is  easy  to 
derive  the  following  standard  result: 
d<f>  r  .  ^2 
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n 


Skeleton  of  simple  algorithm 

Input :  merit  function,  constraints,  and  e. 

•  Cycle  at  (3  =  0  ( ie.  infinite  temperature ) 
to  initialize  the  statistics  3^  s,  <f>,  etc. 

•  repeat 

repeat 

•  generate  a  random  step 

•  reflect  from  linear  constraints 
until  have  feasible  system 

•  accept  or  reject  current  step 

•  update  statistics 

•  increase  p  by 

until  p  is  sufficientiy  high. 


Key  points  in  context  of  lens  design : 

•  Efficient  evaluation  of  merit  function. 

•  Explicitly  incorporate  constraints. 

•  Transformations  of  variables  &  merit  fn. 
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Lens  Design  with  Adaptive  Simulated  Annealing 


Andrew  Jones 
Institute  of  Optics 
University  of  Rochester 


Goal: 

To  locate  with  maximum  certainty  the  global  minimum  of  the 
merit  function  for  a  lens  system. 

Merit  function: 

RMS  spot  size,  integrated  over  the  pupil,  field  and  color: 


o  =  [  J  J  1  I  (e/  +  dy  dx  dX  dh] 

h  X  pupil 

Variables: 

Curvature, 

Thickness, 

Entrance  pupil  location, 
etc. 


Region  of  interest  (constraints  on  independent  variables): 


Simple  inequality  constraints: 


c  .  <  c  <  c 

mm  max 

t  .  <t<t 

mm  max 


•  Feasibility  constraints: 

Non-negative  edge  thicknesses 

-(semi-aperture)"^  <  c  <  (semi-aperture)"^ 


•  Equality  constraints: 
u;  =  -l/(2F#) 


Annealing  can  be  applied  in  a  simple  fashion: 

•  generate  curvatures,  thicknesses,  etc.  randomly 

•  discard  systems  which  violate  constraints 

•  evaluate  remaining  systems 


This  leads  to  problems  at  high  T,  when  we  are  trying  to 
sample  nearly  uniformly  over  the  region  of  interest  —  almost 
all  generated  systems  violate  constraints  and  the  annealing 
process  becomes  very  inefficient. 


Improve  efficiency: 

•  transformation  of  independent  variables 

•  reflection  of  step  vectors  from  constraints 


Example  of  transformation: 

Instead  of  using  curvatures  as  variables,  use  transformed 
variables  x: 

c  =  c  .  +  (x  +  1)  (c  -c.)/2  “l<x<l 
where  c  .  and  c  are  the  minimum  and  maximum 

mm  max 

feasible  values  of  the  curvature.  The  constraints  are  now 
“combined”  with  the  independent  variables,  and  the  region 
of  interest  is  now  an  N-dimensional  “box.” 


Example:  F/8  Landscape  lens,  15*  hfov,  160  mm  efl 


Reflection: 

Even  after  transformation,  some  infeasible  systems  may  be 
generated: 

•  trial  point  with  x>  1  orx<-l 


trial  point 


•  constraint  violation  at  last  surface  (dependent  variable) 


Instead  of  discarding  the  infeasible  trial  point,  we  perform  a 
reflection  about  the  constraint  which  is  violated: 


The  function  q(w)  gives  an  estimate  of  the  “feasibility”  of  a 
point  w: 

q(w)  >0  if  w  is  feasible 
q(\v)  <0  if  w  is  infeasible 
q(w)  =  0  if  w  is  on  the  constraint 
The  larger  the  magnitude  of  q(^,  the  “farther”  w  is  from  the 
constraint. 


The  reflection  should  be  performed  so  that  the  equilibrium 
distribution  at  P  =  0  is  uniform. 

Reflection  algorithm: 

1.  Estimate  the  point  where  the  step  vector  crosses  the 
constraint,  v  +  ^Av 

2.  Choose  the  value  of  jx  such  that  V  +  jxCVq(V)  lies 
on  the  linear  approximation  to  the  constraint 

3.  Set  the  new  trial  point  v^  equal  to  vl  +  2jxCVq(v3 

where 

C  =  covariance  matrix  of  step  distribution 

V  =  base  point 

Av  =  generated  step  vector 

V  =  V  +  Av  =  trial  point  (infeasible) 
v^  =  new  trial  point 

If  the  new  trial  point  v^  is  also  infeasible,  it  is  discarded  and 
a  new  step  vector  is  generated  randomly. 


Transformations: 

Monotonic  nonlinear  transformations  of  the  independent 
variables  and  the  merit  function  can  change  greatly  the 
behavior  of  the  algorithm. 

Example: 

y(x)  =  1.5  x"^  -  x^  -  0.1  X  +  0.3  -1  <  x  <  1 


-1.  -0.5  0.5  1.  -1.  -0.5  0.5  1  . 


Transform:  x  =  0.8  (x')^  +  0.2  x’ 


Adaptive  simulated  annealing  on  parallel  processors: 
Adaptive  simulated  annealing  depends  heavily  upon  the 
statistics  of  previously  generated  trial  points  in  order  to 
determine  the  temperature  schedule  and  step  distribution. 
By  using  parallel  processors,  we  can  improve  greatly  the 
quality  of  the  statistics  collected. 


Using  eight  transputers,  we  can  generate  and  evaluate  for  each 
base  point  groups  of  eight  trial  points  until  an  accepted  point  is 
found.  The  trial  points  and  merit  function  values  are  used  in 
formulating  statistics.  The  annealing  process  is  also  much  more 
efficient. 


Experiences  with  adaptive  simulated  annealing 

•  algorithm  has  been  developed  using  “monochromatic 
quartet”  problem  from  ILDC  ’90: 

•  many  local  minima 

•  minima  differ  significantly 

•  most  minima  are  near  constraints 

•  15  dimensions 


•  algorithm  seems  to  work  well  on  polychromatic  systems 
if  proper  glass  choices  are  made  a  priori 

Possible  future  modifications: 

•  run  downhill  optimizer  from  selected  starting  points 

•  develop  method  for  optimizing  glass  combinations 
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Towards  global  optimization  with  adaptive  simulated  annealing 
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Rochester,  NY  14627 


Abstract 

The  structure  of  the  simulated  annealing  algorithm  is  presented  and  its  rationale  is  discussed  A  unifying  heuristic  is 
then  introduced  which  serves  as  a  guide  in  the  design  of  all  of  the  sub-components  of  the  algorithm.  Simply  put,  this 
heuristic  principle  states  that,  at  every  cycle  in  the  algorithm,  the  occupation  density  should  be  kept  as  close  as  possible  to 
the  equilibrium  distribution.  This  heuristic  has  been  used  as  a  guide  to  develop  novel  stq)  generation  and  temperature 
control  methods  intended  to  improve  the  efficiency  of  the  simulated  annealing  algorithm.  The  resulting  algorithm  has  been 
used  in  attempts  to  locate  good  solutions  for  one  of  the  lens  design  problems  associated  with  this  conference,  viz.  the 
"monochromatic  quartet",  and  a  sample  of  the  results  is  presetted. 

1  Global  optimization  in  the  context  of  lens  design 

Whatevw  the  context,  optimization  algorithms  relate  to  problems  that  take  the  following  form:  Given  some 
configuration  space  with  coordinates  r  =  (xi,X2,...,x,)  and  a  mait  fiinction  written  as/(r),  find  the  point  where/(r) 
takes  it  lowest  value.  That  is,  find  the  global  minimum.  In  many  cases,  there  is  also  a  set  of  auxiliary  constraints  that 
must  be  met  so  the  problem  statemott  becomes;  Find  the  global  minimum  of  the  merit  function  within  the  region  defined 

by  (r)  =  0 ,  y  =  1 , 2 . p  and  /y(r)  >  0 ,  ;  =  1 , 2, ...,  q.  This  region  is  referred  to  in  what  follows  as  the  region  of 

interest.  In  the  context  of  Iras  design,  the  coordinates  are  just  the  system  parameters  in  one  form  or  anothra  ( e.g. 
refractive  indices,  thicknesses,  curvatures  or  radii,  etc. ),  the  inequality  ccxistraints  may  take  the  fcxm  of  the  requirement  for 
positive  edge  thicknesses  or  specifying  maximum  glass  thicknesses  on  axis,  and  an  example  of  an  equality  constraint  is  the 
requirement  of  a  specific  focal  length. 

It  is  evident  that  such  a  problem  is  easy  to  state,  however,  its  solution  is  another  matter  —  in  fact,  this  problem  of 
global  optimization  is  generally  intractable.  This  is  readily  appreciated  by  considering  even  the  simplest  case  illustrated 
schematically  in  Fig.  1  where  a  smooth  one  dimensional  merit  function  is  represented.  Even  if  the  region  of  interest  is 
finite  ( i.e.  thrae  are  inequality  constraints  of  the  form:  x  -  Xo  >  0,  and  Xj  -  x  >  0  ),  it  is  clear  that  by  making  the  steep  dip 
in  the  function  narrower,  it  is  possible  to  make  the  problem  of  locating  the  global  minimum  arbitrarily  difficult. 

In  practice,  as  a  consequence  of  certain  bounds  associated  with  realistic  problems,  the  task  of  locating  the  gbbal 
minimum  is  feasible,  in  principle.  In  most  cases,  howevra,  the  effort  required  to  find  the  minimum  is  truly  formidable. 

For  example,  imagine  that  the  problem  illustrated  in  Fig.  1  represents  a  lens  design  problem  where  the  variable  is,  say,  a 
thickness  limited  to  the  interval  0  to  100mm.  It  is  clear  that  it  is  possible  to  sample  the  merit  function  on  a  regular  grid 
where  the  sample  spacing  is  taken  to  be  comparable  to  the  manufacturing  tolerance  on  such  a  thickness,  presumably 
somewhere  between  10‘^mm  and  10‘^mm.  In  this  way,  with  something  like  a  thousand  samples  ( and  certainly  no  mote 
than  one  hundred  thousand ),  it  is  possible  to  guarantee  that  all  minima  of  any  practical  significance  will  be  found. 


Assuming  that  the  cost  of  each  evaluation  of  the  merit  function  is  typical  of  lens  design  problems,  this  computation  is 
hardly  a  challenge  at  all  —  even  for  a  personal  computer.  Consequently,  the  global  minimum  can  be  guaranteed  in  such  a 
case.  However  this  problem  is  not  representative  of  real  design  problems;  the  key  issue  is  the  dimensionality. 


fix) 


Fig.  1  A  simple  one  dimensional  merit  function. 


Cbnsider  the  case  of  the  "monochromatic  quartet"  (  one  of  the  lens  design  problems  associated  with  this  conference ). 
This  system  has  four  elements  which  means  that  there  are  eight  surfaces  (  each  of  which  has  an  associated  curvature  and 
thickness )  and.  since  the  refractive  indices  ate  fixed,  there  is  a  total  of  sixteen  system  parameters.  In  fact  one  degree  of 
fr^eedom  is  lost  since  the  focal  length  is  required  to  take  a  specified  value  so  the  problem  is  seen  to  be  fifteen-dimensional. 
This,  of  course,  is  relatively  modest  since  the  refractive  indices  are  fixed,  all  surfaces  are  required  to  be  ^heres,  and  there  are 
only  four  elements.  It  is  clear  that  many  problems  in  lens  design  are  of  a  significantly  higho'  dimensionality.  Now 
consider  the  prospect  of  sampling  on  a  regular  grid  in  this  fifteen-dimensional  space.  Suppose  that,  just  to  get  started,  a 
grid  is  laid  out  with  <mly  ten  samples  in  each  dimensbn.  ( It  was  suggested  above  that  possibly  a  thousand  points  or  so 
would  be  more  typical  for  a  reasonable  guarantee. )  In  fifteen  dimensions,  this  coarse  grid  requires  sampling  at  10^^  points 
in  the  configuration  space.  Supposing  that  the  merit  function  can  be  evaluated  by  tracing  just  a  few  rays,  there  are  of  the 
OTder  of  10^®  - 10^^  ray-surfaces  to  be  traced  here.  Currently,  supercomputers  can  trace  no  more  than  about  10®  ray 
surfaces  per  second  so  this  job  would  require  at  least  10^1  seconds.  This  is  about  ten  thousand  years  —  and  we've  hardly 
scratched  the  surface  with  such  coarse  sampling.  Increasing  the  sampling  to  just  thirty  points  in  earii  dimension  pushes  the 
supercomputer  computatitm  time  for  this  modest  lens  design  problem  up  by  another  seven  orders  of  magnitude  to  far 
beyond  the  age  of  the  universe!  This  is  a  formidable  global  optimization  task  and  there  can  be  no  guarantees. 

Of  course,  the  lack  of  a  guarantee  need  not  proscribe  a  search.  It  is  clear,  however,  that  some  drastic  simplification  of 
the  problem  is  necessary.  The  most  common  approach  is  to  reduce  the  demands  placed  on  the  algorithm  and  design  it  to 
seek  any  local  minimum.  Most  optimization  algorithms  used  in  lens  design  simply  seek  a  local  minimum  in  the 
neighborhood  of  some  specified  starting  point  by  proceeding  downhill  until  some  fiat  terrain  is  found.  This  is  a  relatively 
straightforward  process  and  there  is  a  variety  of  algorithms  designed  for  this  purpose.  In  essence,  the  tough  component  of 
the  design  problem  now  becomes  finding  a  good  starting  point  fer  such  a  blinkered  algorithm  to  polish  up.  With  the 
prospect  of  individuals  having  workstations  on  their  desk  (  more  than  likely  sitting  idle  each  night ),  a  more  ambitious, 
automated  search  of  the  configuration  space  is  becoming  a  realistic  option.  In  many  practical  problems,  it  seems 
reasonable  to  expect  to  find  some  good  local  minima. 


This  particular  option  is  what  we  have  in  mind  when  we  speak  of  a  move  towards  global  optimization.  The  goal  of  our 
research  in  this  area  is  the  specification  of  an  algorithm  that,  speaking  loosely,  has  the  maximum  chance  of  finding  the  best 
minima  for  any  given  amount  of  computational  effort.  It  is  expected  that  the  chance  of  success  increases  with  increasing 
effort,  so  when  such  an  algorithm  is  invoked  the  user  must  provide  an  indication  (  even  if  only  a  relative  one )  as  to  how 
much  effort  is  to  be  expended  in  the  current  search.  The  algorithm  would  explore  the  region  of  interest  in  the  configuration 
space  in  search  of  good  minima  and,  given  si0icient  computational  effort,  would  undoubtedly  succeed  in  discovering 
minima  that  are  comparable  to  ( if  not  better  than )  those  uncovered  using  the  conventional  methods.  ( Notice  that  such  an 
algorithm  should  not  require  a  user-specified  starting  point — just  the  specification  of  a  region  to  explore. )  The  question 
that  immediately  comes  to  mind  is:  How  much  computational  effort  is  required  in  the  case  of  modest  lens  design 
problems?  In  1990  terms,  the  answer  is  expected  to  lie  somewhere  between  a  PC-day  and  a  Cray-month. 

A  related,  but  more  easily  answered  question  is:  How  much  success  can  be  had  in  the  context  of  lens  design  with  a 
reasonable  expenditure  of  computational  effort?  By  "reasonable",  it  is  meant  that  the  algorithm  should  terminate  within  a 
realistic  period  of  time,  say  a  couple  of  hours  or  even  a  couple  of  days,  when  running  on  a  medium  sized  computer,  say  one 
whose  cost  is  of  the  order  of  a  typical  annual  salary  of  an  engineer.  At  present,  the  price  may  be  around  $10K  -  $100K 
which  includes  machines  capable  MFLOPS  and  multiprocessii®.  It  is  clear  that  what  is  “reasonable"  by  this  definition  is 
sure  to  be  significantly  diffo-ent  in  just  five  years.  Consequently,  the  answer  to  this  question  is  forever  changing  and  we 
must  make  do  with  a  rough  cut  at  the  answer  for  1990.  To  this  end,  in  Section  4,  we  present  the  results  for  the  design  of  a 
monochromatic  quartet  derived  by  one  particular  global  optimization  algorithm. 


2  Introduction  to  simulated  annealing 

The  devebpment  of  gbbal  search  algorithms  has  proceeded,  in  some  cases,  by  making  analogies  with  models  varying 
from  biological  evolution  with  pairwise  rq)roduction  in  populaticms  subject  to  random  mutation,  through  to 
thermodynamics  and  the  configuration  of  an  alloy  during  cooling.  On  account  of  its  simplicity  and  elegance,  we  have 
chosen  the  general  approach  of  the  so-called  simulated  annealing  algorithm^  as  the  underlying  philosophy  upon  which  to 
base  our  investigation. 


The  simulated  annealing  algorithm  is  essentially  a  prescription  fca'  a  partially  random  walk  in  the  configuration  space. 

At  each  cycle  of  the  algorithm,  a  random  step  is  generated  and,  upon  evaluation  of  the  merit  function  at  the  associated  trial 
point,  a  decision  is  made  as  to  whether  to  accept  that  step  or  noL  A  more  specific  description  of  the  algorithm  involves 
two  entities.  The  first  parameter,  referred  to  as  the  temperature  and  written  here  as  T,  is  used  in  determining  whether  a 
preposed  step  is  to  be  accepted.  In  particular,  if  the  merit  function  is  reduced  as  a  result  of  taking  the  step,  the  step  is 
always  accepted.  On  the  other  hand,  if  the  merit  function  is  increased  by  an  amount  the  step  is  accepted  with  a 
probability  equal  to  .  (  One  way  to  realize  the  process  of  accepting  a  step  with  probability  p  is  to  generate  a  random 
number,  say  u,  uniformly  distributed  between  and  one  and  then  accept  the  step  if  u<p.)  That  is,  T  simply  sets  a 
scale  on  the  size  of  the  allowed  increases  in  the  merit  function  value  for  any  single  step.  The  probabUity  of  accepting  a 
step  fiom  the  point  with  coordinates  r  to  the  point  r',  say  A(r,  r*),  is  now  seen  to  be  given  by 

fl.  fin  <  fir). 


Mr,  r')  =  4 


^-p[/(rv/(r)]^  /(r')>/(r). 


(2.1) 


where  p  is  defined  by  p  :  =  \/T.  The  second  entity  is  a  probability  density  that  characterizes  the  distribution  of  the 
random  steps  generated  at  each  cycle  of  the  algorithm:  given  that  the  current  point  in  the  walk  about  the  configuration 
space  has  coordinates  r,  the  probability  that  a  generated  step  lands  within  a  volume  element  dz'  =  dx\  dx2  at  some 


point  r'  is  written  as  S(r,  f)  dt'.  Notice  that,  on  account  of  this  definition,  it  follows  that  J S(r,  r‘)dx'  =  \  te  all  r. 

In  these  teims,  the  heart  of  the  simulated  annealing  algorithm  takes  the  form  of  repeatedly  carrying  out  the  following  two 
operations: 

•  Generate  a  random  st^  firom  the  current  point  r  with  distribution  5(r,  f). 

•  If  fiy)  <f{r)  then  Aocept  the  step  (i.e.  r  is  replaced  by  the  value  of  r' ), 

else  Accept  the  stqi  with  probability 

Cycling  through  these  two  operations  essentially  generates  a  tour  about  the  configuration  space  and.  to  understand  the 
rationale  behind  this  algorithm,  it  is  important  to  characterize  the  manner  in  which  such  a  tour  occupies  the  space. 

To  this  end,  consider  defining  an  occupation  densi^  as  follows:  First  imagine  a  $et*ig>  consisting  of  a  host  of 
computers  all  carrying  out  this  guided  tour — each  one  starting  in  the  same  way  (  either  firan  a  specific  point  in  the  space 
or  firom  a  point  in  the  region  of  interest  chosen  at  random  in  the  same  way  on  earii  machine )  but  with  the  different 
computes  starting  their  random  number  generators  with  independent  seeds.  Suppose  that  all  the  computers  execute  m 
cycles  of  the  algorithm  given  above  and  now  define  the  occupation  density  pl’"l(r)  in  the  configuration  space  to 
characterize  the  distribution  of  the  final  posifions  arrived  at  by  each  of  the  computers.  That  is,  pl'"Kr)  dt  is  just  the 
firaction  of  the  points  within  the  volume  dt  at  point  r  and  it  follows  that  J  p^”'^(r)  dx  =  1 ,  for  all  m.  According  to  this 
definition,  characterizes  the  distribution  of  the  start  points,  pl^)(r)  characterizes  the  distribution  of  the  points  after 

one  cycle,  etc. 

It  is  now  straightforward  to  d^ve  an  explicit  expression  for  p(****l(r)  in  terms  of  p^"Kr): 

pf'"*^i(r)  =  p('"\r)  +  arrivals-dept2riures,  (2.2a) 

where 

arrivals  =  j  p^"\r')  S(r\  r)  A(r\  r)  di\ 

departures  -  J  p*"^'’)  S(r,  r')  A(r,  r')  dr'.  (2.2b) 

With  this  expression,  it  is  possible  to  investigate  the  evolution  of  the  occupation  density  and  explore  the  effect  of  different 
step  distributions ,  etc.  For  any  fixed  st^  distribution  and  temperature,  the  most  important  and  striking  result  is  that,  as  m 
becomes  large,  this  occupation  density  tends  to  a  fixed  distribution.  This  fixed  distribution,  called  the  equilibrium 
distribution  and  written  here  as  p^(r),  can  be  determined  by  simply  taking  pt^+’Kr)  *  p("l(r)  =  p^(r)  in  Eq.(2.2). 
Provided  the  step  distribution  is  chosen  to  be  symmetric  [  i.e.  5(r,  r’)  =  S(r',  r)  ]  and  T  is  non-zero,  the  resulting  equation 
is  readily  seai  to  be  satisfied  by  an  equilibrium  distribution  of  the  form 

(2.3) 

where  the  temperature  dependence  is  now  made  explicit  f<x  the  equililuium  density  and  A^(P)  is  simply  a  normalization 
factor  given  by 

N(P)  =  J  dx.  (2.4) 

(  As  may  be  expected,  there  are,  in  fact,  other  weak  conditions  on  the  form  of  the  merit  function  and  step  distribution  to 
attain  this  equilibrium  form  —  e.g.  that  they  be  not  too  pathological  —  but  they  are  of  little  significance  here. ) 

Notice  that  the  equilibrium  distribution  is  not  only  independent  of  the  step  distribution  but  it  is  also  independent  of  tte 
initial  occupation  density.  It  is  now  clear  that,  as  may  be  erqrected  for  such  a  partially  random  walk,  the  choice  of  start 
point  is  of  no  consequence.  Also  notice  that  if  p  is  set  equal  to  zero  ( i.e.  the  temperature  is  infinite  so  all  steps  are 


accepted  )  the  equilibrium  density  corresponds  to  uniform  occupation  of  the  configuration  space.  However,  when  p  is  non¬ 
zero  the  equilibrium  density  is  larger  where  the  merit  function  is  lower  and  it  is  easy  to  see  that,  as  p  becomes  large,  the 
equilibrium  density  corresponds  to  having  all  points  clustered  around  the  global  minimum.  The  rationale  behind  simulated 
annealing  now  becomes  clear  start  at  high  temperature  ( where  the  equilibrium  distribution  is  essentially  uniform  )  and 
cycle  while  gradually  reducing  the  temperature.  If  the  temperature  is  reduced  to  zero  infinitely  slowly^  the  occupation 
density  is  always  identical  to  the  associated  equilibrium  density  and  we  are  guaranteed  to  end  up  at  the  gbbal  minimum.  In 
practice,  of  course,  the  temperature  must  be  reduced  faster  than  this  and.  as  a  result,  the  global  minimum  can  not  be 
guaranteed. 

It  is  now  clear  that  there  are  two  crucial  aspects  in  any  implementation  of  the  simulated  annealing  algmthm.  The  first 
is  to  determine  how  to  control  the  temperature  —  there  are  surely  stages  when  it  would  be  advantageous  to  reduce  the 
temperature  relatively  slowly.  Similarly,  the  optimal  step  distribution  must  be  determined  and  it  can  be  expected  that  this 
will  change  with  temperature.  In  earlier  work,  the  temperature  control  ( both  the  initial  value  and  the  reduction  schedule  ) 
and  the  step  generation  components  of  the  algmthm  have  often  been  simplistic  and  governed  by  a  multiplicity  of  unrelated 
ad  hoc  prescriptions.  For  example,  the  temperature  may  be  reduced  by  a  specific  ftaction  every  N  steps*  where  N  is 
predetermined,  and  the  steps  chosen  to  uniformly  fill  a  rectangular  parallelepiped  aligned  to  the  coordinate  axes  and  scaled 
such  that  roughly  50%  of  the  trial  steps  are  accepted.^  Others  have  proposed  coupling  the  parameter  that  is  referred  to  here 
as  temperature  directly  to  the  merit  function  value.^  This  radical  proposal  may  prove  to  be  effective,  but  it  breaks  the 
principal  link  in  the  analogy  with  the  physical  annealing  process  ( i.e.  the  equilibrium  distribution  discussed  above  has  no 
relevance  to  this,  so-called,  generalized  simulated  annealing  algorithm  ).  We  have  taken  a  different  path  and  adopted  a 
unifying,  simple  heuristic  principle  that  serves  as  a  guide  in  the  design  of  every  aspect  of  the  algorithm  and  allows  for 
harmony  in  a  completely  adaptive  algorithm.  The  resulting  algorithm,  which  we  refer  to  as  adaptive  simulated  annealing, 
is  also  designed  such  that  its  performance  is  unchanged  by  linear  transformations  of  the  coordinates. 

3  Adaptive  simulated  annealing 

It  is  intuitive  that  there  are  temperature  schedules  and  step  generation  procedures  that  are  in  some  sense  optimal  for 
annealing.  For  example,  it  is  now  apparent  that  the  parameter  referred  to  as  temperature  determines  the  region  of  spread  of 
the  occupation  density  in  the  configuration  space  which,  in  turn,  can  be  expected  to  set  the  scale  for  the  step  distribution. 

At  any  particular  temperature,  it  is  evident  that  if  the  generated  steps  are  too  small,  the  occupation  density  will  be  slow  to 
change  and  therefore  equilibration  will  also  be  slowed.  Conversely,  if  the  generated  steps  are  too  big.  the  vast  majority  will 
be  rejected  so  that,  once  again,  equilibration  will  be  slowed.  Similarly,  there  will  be  times  when  it  will  be  expedient  to 
reduce  the  temperature  more  rapidly.  The  key  issue  here  is:  Just  how  is  the  worth  of  a  proposed  prescription  for  step 
generation  or  temperature  control  measured?  Ideally,  of  course,  the  probability  of  winding  up  at  the  global  minimum 
should  be  maximized  ( i.e.  the  higher  this  probability,  the  better  the  prescription ).  However,  we  have  found  criteria  like 
this  to  be  unworkable  so  we  have  tried  a  simpler  alternative. 

As  a  means  to  define  a  measure  of  the  effectiveness  of  any  proposed  annealing  algorithm,  we  have  introduced  the 
following  heuristic  principle:  At  every  stage  of  the  annealing  process,  the  occupation  density  should  be  kept  as 
close  as  possible  to  the  equilibrium  density.  That  is,  the  smaller  the  maximum  distance  from  equilibrium,  the  bettor.  On 
a  moment’s  reflection,  it  can  be  appreciated  that  the  traditional  ’'staircase”  temperature  schedule  and  the  associated  concept  of 
cycling  at  each  fixed  temperature  value  "until  equilibrium  is  attained”  must  be  rejected  here.  In  order  to  apply  this  heuristic, 
however,  it  is  necessary  to  introduce  a  measure  of  the  distance  between  two  distributions.  There  are  a  number  of  reasonable 
ways  to  do  this  and  we  have  adopted  a  measure  of  distance  such  that  the  distance  between  two  occupation  densities,  say 


p^(r)  and  pj,(r),  is  given  by 


?<►(»•)}  =  tJ  |P«('’)-P»(»’)|<*-  (31) 

With  this  definition,  since  all  densities  are  everywhoe  positive  and  integrate  to  one  by  definition,  it  follows  that  distance  is 
always  between  zero  ( complete  coincidence  )  and  one  ( nothing  in  common  ).  If  p*  (r)  is  taken  to  be  the  desired 
equilibrium  density,  this  distance  can  be  loosely  intopreted  as  a  measure  of  the  fraction  of  the  cases  for  which  the  walk 
characterized  by  Pa(f)  is  "out  of  place". 

Consider  a  system  where  the  temperature  is  changed  by  a  small  amount  ( "smaU"  is  given  a  more  definite  meaning  in  a 
moment )  at  every  step.  If  at  the  m'th  cycle  the  reciprocal  temperature  is  the  heuristic  can  now  be  stated  more 
specifically  as: 

®{pl"l(r),  p,^(r;  p„)}  <  E,  for  all  m,  (3.2) 

where  e  is  some  fixed  parameter  to  be  specified  as  input  to  the  algorithm.  It  can  be  expected  that  smaller  values  of  e  will 
require  more  computation  for  any  given  annealing  problem.  That  is,  e  is  a  relative  measure  of  how  exhaustively  the 
algorithm  will  explore  the  configuration  space. 


With  these  definitions  and  the  idea  of  small  temperature  changes  at  each  step,  it  seems  natural  to  introduce  a  measure  of 
the  rate  of  change  of  the  equilibrium  distribution  with  change  in  temperature,  i.e.  a  "sensitivity",  as  follows: 


S  := 


lim 


p^(r;p  +  Ap)] 

AP 


(3.3) 


It  can  be  anticipated  that  it  will  be  appropriate  to  cool  more  slowly  when  the  value  of  this  sensitivity  parameter  s  becomes 
larger.  To  quantify  such  an  expectation,  it  is  necessary  to  also  have  some  measure  of  how  much  each  cycle  reduces  the 
distance  to  equilibrium.  Accordingly,  define  a  reduction  factor  ^by 

o{p‘'"'(»’).  P.,(r;p  +  Ap))  ’  ^  ^ 

where  it  is  to  be  assumed  that  pf"^(r)  is  equal  to  p«^  (r;P)  and  the  reciprocal  temperature  for  the  current  step  is  equal  to 
p+Ap.  In  fact,  p^"^  (r)  will  not  be  equal  to  p^  (r;  p)  in  practice,  however,  it  seems  reasonable  to  expect  that 
pt"]  p  +  jg  roughly  equal  to  y{  P«^  (^.  P)  -  P««  (^J  P  +  Ap)  } ,  for  some  constant  y  ( presumably  greater 

thanone).  With  this  more  realistic  form  for  p^"^(r),  i.e.  p*'"^(r)  =  y  p,^(r;p)  +  (l-y)  p,^(r;p+ Ap),  itisreadily 
shown  that,  as  a  consequence  of  homogeneity,  the  resulting  value  of  the  reduction  factor  is  identical  to  the  simpler  form 
defined  above,  for  any  non-zero  value  of  y. 


If  each  cycle  of  the  annealing  process  is  successfully  drawing  the  occupation  density  toward  the  equilibrium  d^sity,  the 
reduction  factor  defined  above  must  take  values  between  zero  ( instant  equilibration  )  and  one  ( infinite  time  to  equilibrate  ). 
It  is  now  {Opposed  that  the  step  distribution  should  be  chosen  to  minimize  the  value  of  ^at  each  stagp  of  the  process.  That 
is,  equilibration  should  be  as  rapid  as  possible  —  in  keeping  with  the  heuristic  and  the  desire  to  reduce  the  temperature  as 
quiddy  as  possible.  This  means  that  there  is  now  a  well  defined  criterion  for  assessing  step  generation  algcnithms. 
Furthermore,  the  reduction  factor,  the  sensitivity,  and  the  statement  of  the  heuristic  given  in  Eq.(3 .2)  can  be  used  together 
to  deduce  that,  at  each  cycle,  the  reciprocal  temperature  should  be  increased  by  no  more  than 


Ap  =  i(L_2:). 

S 


(3.5) 


This  follows  simply  from  the  fact  that  the  equilibrium  distribution  moves  a  distance  5 AP  when  the  reciprocal  temperature 
is  changed  by  Ap  and  that,  assuming  the  distance  to  equilibrium  was  about  e,  one  cycle  of  the  algorithm  will  reduce  this 
distance  by  a  factor  of  ^  The  skeleton  of  an  adaptive  algorithm  is  now  apparent:  start  with  p  set  equal  to  zero,  generate 
steps  chosen  to  minimize  ^and,  at  each  cycle,  change  the  lonperature  by  the  value  specified  in  Eq.(3.5).  There  is  only  one 
missing  piece  at  this  stage:  a  means  to  estimate  :^and  5  ''on  the  fly". 


It  is  a  simple  exercise  to  expand  the  form  defining  5m  Eq.(3.3)  as  a  Taylor  series  in  Ap,  use  the  definition  of  distance 
given  in  Eq.(3. 1),  and  the  form  of  the  equilibrium  distribution  specified  in  Eq.(2.3)  in  order  to  find  the  following  exact 
result: 


(3.6) 


where  <  /  >  is  defined  by 

</>  :=  Jp,,('-;P)/(OA.  (3.7) 


Simflarly,  Eq.(3.4)  can  be  reduced  to 

J  P^(r;  P)  !/(»•)-  <  /  >  +J (/(»•’) -  /(r)]S(r. r’)A(r.r')dr- 


^  = 


I  p„('';P)  [/(»•)-</ 


dx 


(3.8) 


The  key  observation  here  is  that  each  of  these  three  expressions  can  be  estimated  during  the  annealing  process  although  this 
requires  some  explanation. 


Notice  that  all  but  one  of  the  integrals  appearing  in  these  equations  take  the  form  of  an  integral  over  the  configuration 
space  of  a  function  of  the  value  of  the  merit  function  mult^lied  by  the  equilibrium  density.  Integrals  like  this  can  be 
readily  estimated  if  the  walk  is  ergodic:  So  far,  the  occupation  and  equilitnium  densities  have  been  defined  by  reference  to  a 
host  of  independent  runs  of  the  algorithm.  Now  consider  a  single  computer  executing  one  such  walk  about  the 
configuration  space  at  fixed  temperature.  The  walk  is  said  to  be  ergodic  if  the  occupation  density  defined  in  terms  of  the 
recorded  history  of  the  current  point  for  this  one  system  tends  to  the  same  equilibrium  density  defined  above.  It  is  expected 
that  [  under  weak  conditions  on  the  merit  function  and  step  distribution  like  those  brushed  aside  following  £q.(2.4)  ]  this 
will  be  so  here  —  all  that  has  happened  is  that  the  observations  fiom  many  separate  runs  have  been  replaced  by  many 
observations  from  the  one  run  and  remember  that  the  random  aspect  scrambles  any  infeamation  about  the  past  for  each  such 
walk.  That  is,  where  the  walk  wilt  end  up  after  N  more  stq)s  is  essentially  independent  cf  the  current  point  provided  N  is 
large  enough.  This  being  so,  an  integral  like  the  one  appearing  in  the  definition  of  <  /  >  is  readily  evaluated  by  simply 
averaging  over  the  current  value  of  the  merit  function  at  each  cycle  of  the  walk.  Since  the  temperature  is  gradually 
changing,  it  is  necessary  to  average  only  over  a  set  of  the  most  recent  values.  This  can  be  done  conveniently  by  using 
exponentially  decaying  weights  in  the  averaging  process.  A  similar  approach  can  be  taken  to  the  estimation  of  ^and  S 
which,  as  indicated  above,  form  the  basis  of  the  adaptive  control  in  our  simulated  annealing  algorithm. 


A  significant  simplification  in  the  step  generation  process  follows  if  the  covariance  matrix  of  the  generated  steps  is 
required  to  be  proportional  to  the  covariance  of  either  the  acc^ted  steps  or  the  equilibrium  distribution.  This  virtually 
^sures  that  the  algorithm  is  invariant  under  linear  changes  in  the  coordinate  system.  Th^e  is  a  number  of  ways  to  achieve 
this  and  we  have  chosen  to  use  an  n-dimensional  Gaussian  distribution  whcee  covariance  is  coupled  to  the  accepted  steps. 
This  is  realized  by  relying  on  the  central  limit  theorem  and  simply  forming  new  steps  by  making  random  linear 
combinations  of  previously  accepted  steps  which  leaves  only  the  overall  scale  factor  free.  This  scale  factor  is  controlled  in 
a  fashion  derived  firom  the  requirement  that  !^be  minimized.  The  remaining  details  of  the  adaptive  simulated  annealing 
algorithm  are  to  be  presented  elsewhere.  In  closing,  it  is  mentioned  that,  in  terms  of  the  statistics  gathering  [  especially  for 


the  innennost  integral  in  Eq.(3.8)  ]  and  reducing  the  overall  run  time,  there  are  significant  advantages  if  the  algorithm  is 
implemented  in  a  multiprocessing  environment  so  that,  from  each  currm  point,  multiple  random  steps  can  be  investigated 
at  the  one  time. 

4  Preliminary  results  for  the  "monochromatic  quartet" 

In  the  implementation  of  any  global  optimization  process  for  lots  design,  there  are  several  important  issues  that  must 
be  dealt  with  at  the  outset.  First,  since  many  thousands  —  perhaps  millions  —  of  optical  systems  will  be  assessed  in  the 
process,  it  is  vital  that  the  evaluation  of  the  m^t  function  be  done  as  efficiently  as  possible.  For  a  problem  like  the 
monochromatic  quartet,  using  Gaussian  quadrature^  is  surely  the  most  effective  approach.  Second,  for  random  walks  in  the 
configuration  space  like  the  type  proposed  here,  it  is  crucial  to  explicitly  enforce  as  many  of  the  constraints  as  possible. 

For  example,  choosing  a  set  of  curvatures  and  thidoiesses  at  random  is  virtually  guaranteed  to  give  an  impractical  system 
where  edge  thicknesses  may  be  negative  cr  there  may  be  such  a  high  degree  of  vignetting  that  essentially  no  fight  gets 
through.  Accordingly,  we  have  chos^  variables  which  guarantee  that  almost  aU  of  the  goierated  systems  are  feasible.  For 
instance,  to  avoid  vignetting  problems,  the  surface  curvatures  may  be  defined  with  respect  to  the  (  first  orda )  unvignetted 
^>erture  radius,  say  A,  in  the  associated  plane,  etc.  In  the  interest  of  efficiency  and  fair  sampling,  it  is  also  crucial  to  devise 
a  ( linear  coordinate  transformation  independent )  procedure  for  reflecting  wayward  steps  off  violated  constraints  to  force  new 
trial  points  to  fie  always  within  the  region  of  interest.  For  instance,  if  a  generated  stq>  causes  an  element  to  have  negative 
thickness  on  axis,  it  is  apprq)riate  to  add  a  component  proportional  to  the  gradient  of  the  constraint  (  with  respect  to  the 
natural  metric  associated  with  the  accepted  steps )  in  ord»  to  reflect  this  stq)  back  into  the  region  of  interest. 

Another  important  point  to  realize  is  that  non-linear  transformations  of  the  variables  ( e.g.  using  surface  radii  in  place  of 
the  curvatures )  fundamentally  changes  the  optimization  problem  —  a  totally  random  walk  in  the  configuration  space  will 
have  an  entirely  different  occupation  density  for  different  sets  of  coordinates.  Furthermore,  using  geographic  terminology, 
the  merit  function  terrain  may  be  smooth  and  rolling  with  wide  valleys  surrounding  the  lowest  minima  in  one  set  of 
coordinates,  while  this  same  problem  may  have  steep  curved  gulleys  with  the  lowest  minima  hidden  as  "sink  holes  up  in 
the  hills"  when  viewed  using  another  set  of  coordinates.  Clearly,  a  global  optimize  would  have  a  significantly  betta 
chance  of  finding  the  global  minimum  with  the  former  coordinate  set  Furthermore,  since  the  global  minimum  of /(r)  is 
also  the  global  minimum  of  any  function  of  the  form  Af  [ffr)]  provided  M  is  a  monotonic,  increasing  function,  there  is 
another  set  of  transformations  of  a  particular  problem  that  must  be  ctmsido'ed.  It  is  crucial  to  realize  that  the  equilibrium 
densities  for  the  annealing  algorithm  are  fundamentally  different  ^en  a  non-linear  transformation  of  this  type  is  aj^fied  to 
the  merit  function.  Fra  example,  is  it  better  to  optimize  using  a  mean  square  critraiem,  a  root  mean  square  criterion  or 
some  other  transformation  of  the  merit  function? 

We  have  not  had  the  opportunity  to  explore  coordinate  transformations  or  merit  function  transfoimations  in  any 
systematic  way  since  we  are  sdll  in  the  process  of  fine-tuning  tl^  free  parameters  that  remain  in  the  adaptive  simulated 
annealing  algorithm.  Furthermore,  we  have  not  explored  the  option  of  incorporating  a  regular  downhill  optimizer  which 
may  be  invoked  to  start  at  any  promising  int^mediate  configuratirats.  Ideally  all  these  considerations  should  be  combined 
into  the  one  investigation  since,  no  doubt,  there  are  interdependencies  here.  Having  done  all  this,  it  would  then  prahaps  be 
clear  whether  it  is  best  to  have  a  small  number  of  extensive  runs  (  with  the  value  of  e  taken  to  be  as  small  as  possible 
while  remaining  within  the  constraints  imposed  by  a  limited  computational  budget )  or  many  quicker,  independent  runs. 

We  have  implemented  adaptive  simulated  annealing  on  an  accelerated  Macintosh  II.  The  computer  is  capable  of 
multiprocessing  since  it  incorporates  eight  "transputer”  processors  (  made  by  Inmos )  giving  us  a  throughput  comparable  to 


the  newly  released  RISC  workstations  that  are  available  from  a  variety  of  computer  fims.  The  option  of  parallel 
processing  suits  this  application  well.  In  order  to  interpret  the  results  of  the  algorithm  in  the  case  of  the  monochromatic 
quartet,  we  began  with  the  same  inoblem  except  the  number  of  elements  was  first  reduced  to  one  and  then  increased  to  two 
and  three.  That  is,  the  focal  length,  f-number,  field  angle,  merit  function,  etc  were  all  unchanged.  We  executed  a  number 
of  runs  on  each  of  these  simpler  inoblems  and  fairly  consistently  came  up  with  the  three  best  systems  shown  in  Fig.  2. 
These  results  were  quite  encouraging  and  provided  us  with  a  scale  for  the  results  in  the  four  element  case:  a  quartet  with  a 
spot  size  in  excess  of  1  S|xm  can  be  regarded  as  a  poor  solution.  But  what  is  the  mark  of  a  good  solution?  A  sample  of  the 
results  we  found  for  the  monochromatic  quartet  are  presented  in  Fig.  3.  These  were  evaluated  in  overnight  runs  where  the 
cooling  parameter  e  was  set  so  that  each  night  would  allow  up  to  about  six  separate  attempts  at  the  problem  —  a  new  run 
starting  when  the  preceding  run  had  reduced  the  temperature  to  a  level  that  the  associated  changes  in  the  merit  function  were 
no  long^  significant  There  was  a  wide  variety  of  solutions  with  many  interesting  forms  giving  spot  sizes  in  excess  of 
1  Spm.  It  turns  out  that  the  better  solutions  presented  here  are  within  a  factor  of  two  of  the  best  solutions  reported  at  the 
conference  and  this  is  an  encouraging  result  considering  the  fact  that  there  is  much  left  for  us  to  investigate  in  this  context 

5  Concluding  remarks 

There  are  many  aspects  of  the  annealing  process  which  we  are  yet  to  explore  fully  in  the  context  of  lens  design.  Most 
important  among  these  is  the  investigation  of  the  significance  of  specific  transformations  of  the  lens  design  merit  functions 
and  coordinates  (  which  include  alternative  schemes  for  enforcing  such  things  as  focal  length  constraints  ).  In  a  broader 
context,  we  are  yet  to  consider  the  incorporation  of  downhill  optimizers.  Even  so,  the  early  results  are  very  encouraging. 
The  three  element  system  found  reliably  by  adaptive  simulated  annealing  was  confirmed  in  a  personal  communicaticm  from 
David  Shafer  ( the  creator  of  the  monochromatic  quartet  problem  )  to  coincide  with  the  best  solution  known  to  him.  A 
couple  of  the  monochromatic  quartet  solutions  are  within  about  a  factor  of  two  ( of  the  2|im  spot  size  )  of  the  best  systems 
reported  at  this  meeting.  The  variety  of  different  systems  with  spot  sizes  around  l0-20)im  that  was  found  by  annealing 
suggests  that  lens  design  problems  are  sure  to  offer  an  attractive  challenge  to  any  advocates  of  particular  global 
optimization  algorithms.  There  is  a  wide  variety  of  algorithms  that  have  been  proposed  for  global  optimization  and  there 
are  many  aspects  to  be  addressed  in  their  successful  implementation  in  any  given  context  We  would  be  ddighted  if,  as  in 
years  gone  by,  the  fidd  of  lens  design  becomes  the  arena  fw  the  development  of  state-of-the-art  optimization  algorithms. 
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